Prescriptive Process Monitoring systems recommend, during the execution of a business process, interventions that, if followed, prevent a negative outcome of the process. Such interventions have to be reliable, that is, they have to guarantee the achievement of the desired outcome or performance, and they have to be flexible, that is, they have to avoid overturning the normal process execution or forcing the execution of a given activity. Most of the existing Prescriptive Process Monitoring solutions, however, while performing well in terms of recommendation reliability, provide the users with very specific (sequences of) activities that have to be executed without caring about the feasibility of these recommendations. In order to face this issue, we propose a new Outcome-Oriented Prescriptive Process Monitoring system recommending temporal relations between activities that have to be guaranteed during the process execution in order to achieve a desired outcome. This softens the mandatory execution of an activity at a given point in time, thus leaving more freedom to the user in deciding the interventions to put in place. Our approach defines these temporal relations with Linear Temporal Logic over finite traces patterns that are used as features to describe the historical process data recorded in an event log by the information systems supporting the execution of the process. Such encoded log is used to train a Machine Learning classifier to learn a mapping between the temporal patterns and the outcome of a process execution. The classifier is then queried at runtime to return as recommendations the most salient temporal patterns to be satisfied to maximize the likelihood of a certain outcome for an input ongoing process execution. The proposed system is assessed using a pool of 22 real-life event logs that have already been used as a benchmark in the Process Mining community.
translated by 谷歌翻译
流程数据可用性的兴起最近导致了数据驱动的学习方法的发展。但是,这些方法中的大多数限制了学习模型的使用来预测正在进行的过程执行的未来。本文的目的是向向前迈出一步,并利用可用的数据来学习采取行动,通过支持用户的最佳策略(绩效衡量)的建议。我们采用一个过程参与者的优化视角,我们建议下一步执行的最佳活动,以响应在复杂的外部环境中发生的事情,而外源性因素没有控制。为此,我们研究了一种通过强化学习来学习的方法,从观察过去的执行中学习的最佳政策,并建议开展最佳活动,以进行优化关键的兴趣指标。该方法的有效性在从现实生活数据中获取的两种情况下得到了证明。
translated by 谷歌翻译
业务流程偏差是指业务流程执行的子集的现象,以消极或积极的方式偏离{他们的预期或理想的结果。业务流程的偏差执行包括违反合规规则的人,或者欠冲前或超过绩效目标的执行。偏差挖掘涉及通过分析支持业务流程的系统存储的事件日志来揭示揭示异常执行的原因。在本文中,首先通过基于顺序和声明模式模式的特征和它们的组合来研究解释业务流程的偏差问题。然后,通过基于纯数据属性值和数据感知声明规则利用事件日志中的事件日志和迹线的数据属性来进一步提高说明。然后通过用于规则感应的直接和间接方法来提取表征消化的解释。使用来自多个域的实际日志,根据他们准确地区分过程的非偏差和异常执行能力以及决赛的可理解性的能力来评估一系列特征类型和不同形式的决策规则。返回给用户的结果。
translated by 谷歌翻译
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
许多可解释性工具使从业人员和研究人员可以解释自然语言处理系统。但是,每个工具都需要不同的配置,并提供不同形式的解释,从而阻碍了评估和比较它们的可能性。原则上的统一评估基准将指导用户解决中心问题:哪种解释方法对我的用例更可靠?我们介绍了雪貂,这是一个易于使用的,可扩展的Python库,以解释与拥抱面枢纽集成的基于变形金刚的模型。它提供了一个统一的基准测试套件来测试和比较任何文本或可解释性语料库的广泛最先进的解释器。此外,雪貂提供方便的编程摘要,以促进新的解释方法,数据集或评估指标的引入。
translated by 谷歌翻译
Anomaly Detection is a relevant problem that arises in numerous real-world applications, especially when dealing with images. However, there has been little research for this task in the Continual Learning setting. In this work, we introduce a novel approach called SCALE (SCALing is Enough) to perform Compressed Replay in a framework for Anomaly Detection in Continual Learning setting. The proposed technique scales and compresses the original images using a Super Resolution model which, to the best of our knowledge, is studied for the first time in the Continual Learning setting. SCALE can achieve a high level of compression while maintaining a high level of image reconstruction quality. In conjunction with other Anomaly Detection approaches, it can achieve optimal results. To validate the proposed approach, we use a real-world dataset of images with pixel-based anomalies, with the scope to provide a reliable benchmark for Anomaly Detection in the context of Continual Learning, serving as a foundation for further advancements in the field.
translated by 谷歌翻译
Algorithms and technologies are essential tools that pervade all aspects of our daily lives. In the last decades, health care research benefited from new computer-based recruiting methods, the use of federated architectures for data storage, the introduction of innovative analyses of datasets, and so on. Nevertheless, health care datasets can still be affected by data bias. Due to data bias, they provide a distorted view of reality, leading to wrong analysis results and, consequently, decisions. For example, in a clinical trial that studied the risk of cardiovascular diseases, predictions were wrong due to the lack of data on ethnic minorities. It is, therefore, of paramount importance for researchers to acknowledge data bias that may be present in the datasets they use, eventually adopt techniques to mitigate them and control if and how analyses results are impacted. This paper proposes a method to address bias in datasets that: (i) defines the types of data bias that may be present in the dataset, (ii) characterizes and quantifies data bias with adequate metrics, (iii) provides guidelines to identify, measure, and mitigate data bias for different data sources. The method we propose is applicable both for prospective and retrospective clinical trials. We evaluate our proposal both through theoretical considerations and through interviews with researchers in the health care environment.
translated by 谷歌翻译
We consider the problem of two active particles in 2D complex flows with the multi-objective goals of minimizing both the dispersion rate and the energy consumption of the pair. We approach the problem by means of Multi Objective Reinforcement Learning (MORL), combining scalarization techniques together with a Q-learning algorithm, for Lagrangian drifters that have variable swimming velocity. We show that MORL is able to find a set of trade-off solutions forming an optimal Pareto frontier. As a benchmark, we show that a set of heuristic strategies are dominated by the MORL solutions. We consider the situation in which the agents cannot update their control variables continuously, but only after a discrete (decision) time, $\tau$. We show that there is a range of decision times, in between the Lyapunov time and the continuous updating limit, where Reinforcement Learning finds strategies that significantly improve over heuristics. In particular, we discuss how large decision times require enhanced knowledge of the flow, whereas for smaller $\tau$ all a priori heuristic strategies become Pareto optimal.
translated by 谷歌翻译
Explanations are crucial parts of deep neural network (DNN) classifiers. In high stakes applications, faithful and robust explanations are important to understand and gain trust in DNN classifiers. However, recent work has shown that state-of-the-art attribution methods in text classifiers are susceptible to imperceptible adversarial perturbations that alter explanations significantly while maintaining the correct prediction outcome. If undetected, this can critically mislead the users of DNNs. Thus, it is crucial to understand the influence of such adversarial perturbations on the networks' explanations and their perceptibility. In this work, we establish a novel definition of attribution robustness (AR) in text classification, based on Lipschitz continuity. Crucially, it reflects both attribution change induced by adversarial input alterations and perceptibility of such alterations. Moreover, we introduce a wide set of text similarity measures to effectively capture locality between two text samples and imperceptibility of adversarial perturbations in text. We then propose our novel TransformerExplanationAttack (TEA), a strong adversary that provides a tight estimation for attribution robustness in text classification. TEA uses state-of-the-art language models to extract word substitutions that result in fluent, contextual adversarial samples. Finally, with experiments on several text classification architectures, we show that TEA consistently outperforms current state-of-the-art AR estimators, yielding perturbations that alter explanations to a greater extent while being more fluent and less perceptible.
translated by 谷歌翻译
In 2021 300 mm of rain, nearly half the average annual rainfall, fell near Catania (Sicily island, Italy). Such events took place in just a few hours, with dramatic consequences on the environmental, social, economic, and health systems of the region. This is the reason why, detecting extreme rainfall events is a crucial prerequisite for planning actions able to reverse possibly intensified dramatic future scenarios. In this paper, the Affinity Propagation algorithm, a clustering algorithm grounded on machine learning, was applied, to the best of our knowledge, for the first time, to identify excess rain events in Sicily. This was possible by using a high-frequency, large dataset we collected, ranging from 2009 to 2021 which we named RSE (the Rainfall Sicily Extreme dataset). Weather indicators were then been employed to validate the results, thus confirming the presence of recent anomalous rainfall events in eastern Sicily. We believe that easy-to-use and multi-modal data science techniques, such as the one proposed in this study, could give rise to significant improvements in policy-making for successfully contrasting climate changes.
translated by 谷歌翻译